Logged in as: guest Log in
Lab 7 mcmillan / Version 3

 

Recursion, Debugging, and Stack Dumps

March 22, 2013


Prelab (complete before lab)

    This Lab introduces the gdb debugger. It assumes that you are familar with the use of pointers, and that you already know how to compile and execute a program.

    To illustrate debugging principles we will use an example buggy program. Each bug is documented with an associated comment. As you progress through the lab, you will use the debugger to locate and fix each one. The buggy program can be downloaded here. The code is very simple and consists of three functions and the declaration and initialization of a binary tree.

    Prior to the lab download the program, compile, and run it. The program will print out some messages, indicate a segmentation fault, and then crash. Given only the output information, it is near impossible to determine why the program crashed, much less how to fix it. You should get a result similar to the following:

    $ Lab7a
    Before:
     3: 10
     1: 30
     7: 32
     4: 35
     8: 40
     0: 42
    Segmentation fault: 11
    

    Next, fix the bugs as indicated in the comments, compile, and rerun it. This time you should get:

    $ Lab7a
    Before:
     3: 10
     1: 30
     7: 32
     4: 35
     8: 40
     0: 42
     5: 50
     9: 60
     2: 80
    10: 85
     6: 90
    11: 95
    After:
     3: 10
     1: 30
     7: 32
     4: 35
     8: 40
     0: 42
    5863771034741: 45
     5: 50
     9: 60
     2: 80
    10: 85
     6: 90
    11: 95
    

    Make sure that you revert your code to the original buggy version before starting the lab.

Part 1:

    Bugs happen. During a programmer's career there will be many occasions where she is forced to debug a program that she wrote, or one someone else wrote. There are many ways to go about debugging, from strategic use of printf()s, to just thinking about what the program is doing and making an educated guess as to what the problem is. However, there is a third option, using a debugger, which is particularly useful for tracking down subtle bugs.

    Before a bug can be fixed, the source of the bug must be located. For example, with segmentation faults, it is useful to know on which line of code the seg fault is occuring. Once the line of code in question has been found, it is useful to knowthe values of variables, who called the function, and why (specifically) the error is occuring. A debugger makes provides a means for answering all of these questions and more.

    The compiler can add supplemental information to the symbol table to aid the debugging process, this includes line numbers, and aids to access to all variables both global and local to a function (including those created on the stack). With gcc, this is accomplished using the -ggdb command line switch. Go ahead and compile the given buggy program using the following command:

    $ gcc -ggdb -o Lab7 Lab7.c

    Now we can start debugging by typing the following command:

    $ gdb Lab7
    GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
    Copyright (C) 2010 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later 
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    For bug reporting instructions, please see:
    ...
    Reading symbols from /afs/cs.unc.edu/home/mcmillan/lab5/Lab7...done.
    (gdb)
    

    The (gdb) prompt indicates that you are running in the debugger. Next enter the 'run' command as follows:

    (gdb) run
    Starting program: /afs/cs.unc.edu/home/mcmillan/lab5/Lab7 
    Before:
     3: 10
     1: 30
     7: 32
     4: 35
     8: 40
     0: 42
    
    Program received signal SIGSEGV, Segmentation fault.
    0x000000000040050d in depthFirst (tptr=Cannot access memory at address 0x7fffff3feff8) at Lab7.c:24
    24	void depthFirst(struct node *tptr) {
    Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.x86_64
    

    Now that the program has crashed lets see what information we can gather. We've already found out that the program crashed while executing the depthFirst() function on line 24. We might be interested in findeing out who called the errant function. This can be found out by typing 'backtrace' at the gdb prompt as follows:

    (gdb) backtrace
    #0  0x000000000040050d in depthFirst (tptr=Cannot access memory at address 0x7fffff3feff8) at Lab7.c:24
    #1  0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #2  0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #3  0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #4  0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #5  0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #6  0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #7  0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #8  0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #9  0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #10 0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #11 0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #12 0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #13 0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #14 0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #15 0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #16 0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #17 0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #18 0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #19 0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #20 0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #21 0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    #22 0x000000000040052e in depthFirst (tptr=0x600b18) at Lab7.c:27
    ---Type  to continue, or q  to quit---
    

    The 'backtrace' command examines the stack frame of each callee, and determines its caller. This allows one to examine the sequence of called functions.

    Checkoff #1. Press return a few times to determine how deep the recursion of the depthFirst() function. Write down a guess of why the program generated the segmentation fault.

    Okay, let's start over by typing 'q', to get out of the stack dump. Then type 'quit' at the '(gdb)' prompt, and finally answering 'y'. This should return you the UNIX command prompt.

Part 2:

    Another useful debugging method is to stop and examine the program at various stages. This can be done using breakpoints just like we did using the miniMIPS simulator. Lets restart a debugging session, and set a breakpoint that will allow us to examine the 'depthFirst()' function each time it is called. The following series of commands accomplishes this:

    $ gdb Lab7
    GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
    Copyright (C) 2010 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later 
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    For bug reporting instructions, please see:
    ...
    Reading symbols from /afs/cs.unc.edu/home/mcmillan/lab5/Lab7...done.
    (gdb) break depthFirst
    Breakpoint 1 at 0x400511: file Lab7.c, line 26.
    (gdb)
    

    Then enter the command 'run' as follows:

    (gdb) run
    Starting program: /afs/cs.unc.edu/home/mcmillan/lab5/Lab7 
    Before:
    
    Breakpoint 1, depthFirst (tptr=0x600aa0) at Lab7.c:26
    26	    if (tptr->left != NULL)
    Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.x86_64
    (gdb)
    

    At this point we can examine local variables and arguments passed to the function. At this point it would be useful to examine the 'node' argument passed in via the '*nptr' argument of depthFirst(). This is accomplished using the 'print' command as follows:

    (gdb) print *tptr
    $1 = {value = 42, left = 0x600ab8, right = 0x600ad0}
    (gdb)
    

    Let's continue executing the program until we reach the first printf()

    (gdb) continue
    Continuing.
    
    Breakpoint 1, depthFirst (tptr=0x600ab8) at Lab7.c:26
    26	    if (tptr->left != NULL)
    (gdb) print *tptr
    $3 = {value = 30, left = 0x600ae8, right = 0x600b00}
    (gdb) continue
    Continuing.
    
    Breakpoint 1, depthFirst (tptr=0x600ae8) at Lab7.c:26
    26	    if (tptr->left != NULL)
    (gdb) print *tptr
    $4 = {value = 10, left = 0x0, right = 0x0}
    (gdb) continue
    Continuing.
     3: 10
     1: 30
    
    Breakpoint 1, depthFirst (tptr=0x600b00) at Lab7.c:26
    26	    if (tptr->left != NULL)
    (gdb)
    

    Continue stepping through the program to the point where the 'printf()s' stop and the node pointed to by '*tptr' starts to repeat. At this point the problem is evident, a link of our binary tree is improperly forming a loop. If fact, it points to itself. This explains the repeated value of '*tptr'

    Checkoff #2. At this point, enter the 'backtrace' command and write down the resulting calling sequence.

    Now that we have tracked down the first program bug, use an editor to fix the problem and then continue to the next part.

Part 3:

    Now that the first bug is fixed, you can recompile and run it. Once more this should lead to a segmentation fault as follows:

    $ Lab7
    Before:
     3: 10
     1: 30
     7: 32
     4: 35
     8: 40
     0: 42
     5: 50
     9: 60
     2: 80
    10: 85
     6: 90
    11: 95
    Segmentation fault (core dumped)
    

    So let's re-enter the debugger to figure out the remaining problem. Start by executing 'run', then examine the 'node' referred to by '*nptr' at the time of the segmentation fault.

    % gdb Lab7
    GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
    Copyright (C) 2010 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later 
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    For bug reporting instructions, please see:
    ...
    Reading symbols from /afs/cs.unc.edu/home/mcmillan/lab5/Lab7...done.
    (gdb) run
    Starting program: /afs/cs.unc.edu/home/mcmillan/lab5/Lab7 
    Before:
     3: 10
     1: 30
     7: 32
     4: 35
     8: 40
     0: 42
     5: 50
     9: 60
     2: 80
    10: 85
     6: 90
    11: 95
    
    Program received signal SIGSEGV, Segmentation fault.
    0x00000000004005b1 in addNode (tptr=0x0, nptr=0x7fffffffe390) at Lab7.c:35
    35	    if (nptr->value >= tptr->value)
    Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.x86_64
    (gdb) print *nptr
    $1 = {value = 45, left = 0x0, right = 0x0}
    (gdb) 
    

    In this case, we also have some addtional information. The call to addNode which generated the fault, refers to a subtree (*tptr) whose address is 0. This address is reserved by the operating system, and commonly used as a 'NULL' pointer. In other words, a pointer to nothing.

    At this point we can execute 'backtrace' to see the path of calls that led to the problem.

    (gdb) backtrace
    #0  0x00000000004005b1 in addNode (tptr=0x0, nptr=0x7fffffffe390) at Lab7.c:35
    #1  0x000000000040061d in addNode (tptr=0x600b18, nptr=0x7fffffffe390)
        at Lab7.c:44
    #2  0x000000000040061d in addNode (tptr=0x600ad0, nptr=0x7fffffffe390)
        at Lab7.c:44
    #3  0x00000000004005e9 in addNode (tptr=0x600aa0, nptr=0x7fffffffe390)
        at Lab7.c:39
    #4  0x0000000000400663 in main () at Lab7.c:52
    (gdb)
    

    There is a important hint in this stack dump. In particular, the caller of the current addNode() instance (#0) must called the current instance with the 'NULL' subtree pointer. we can examine the node with the specified address by using a typecast of the address (NOTE: this address might be different on your version) as follows:

    (gdb) print *((struct node *) 0x600b18)
    $4 = {value = 50, left = 0x0, right = 0x600b78}
    

    Now we can see where the '0' or NULL value came from, the node's left child. If we re-examine the code, the bug is now obvious, a right child was specified where the left child was called for. This was was at the point where the addNode() function reached a tree leaf.

    Fix the bug using an editor. Recompile, and execute it. This time it should run to completion

    Checkoff #3. Draw a diagram of the final binary tree represented in the array 'tree' after the insertion of 'newNode1'.




Site built using pyWeb version 1.10
© 2010 Leonard McMillan, Alex Jackson and UNC Computational Genetics